Introduction

This paper explores the use of abstract data types as a modularization and
structuring technique in the design of programs, particularly large programs including
compilers and operating systems. The emphasis is on design. The particular language
in which the program will be coded should be of relatively little importance in the
design process. We are interested in a technique for constructing specifications which
can be converted at another time (and possibly by others) into an implementation.
Programming languages which provide data type facilities to varying degrees have
been in existence since about 1967 when SIMULA-67 [1] appeared. Since then, there
has been an ever increasing number of such languages, notably ALGOL-68 [7],
PASCAL [8], and EL1 [6]. There are more advanced languages currently under
development or proposed, including ALPHARD [9], CLU [3], and PASQUAL [5]. A
useful bibliography of papers concerned with data types and programming languages
may be found in [5].

Abstraction  refers  to  the  mental  process  which,  when  confronted  with  a set  of 
objects,  can  distinguish  some  objects  from  others  on  the  basis  of  what  they  have  in 
common.  What  is  "abstracted  out"  is  a unifying  theme,  called  the  type  of  those  objects. 
To  pick  a conventional  example,  the  set  { 7,  1.35,  -6,  ’A’,  0.3E6  } contains  three  types 
of  entities:  integers,  reals,  and  characters.  These  are  types  which  just  about  every 
programming  language  provides.  More  abstractly,  the  set  contains  only  two  types: 
numbers  and  characters.  Even  more  abstractly,  it  contains  only  one  type  - constant. 
When  we  prefix  "data  type"  with  "abstract"  we  mean  an  arbitrarily  complex  type  which 
characterizes  a certain  kind  of  behavior. 

The concept of using abstractions in programs to control complexity is certainly
not a new one. For very small programs it is sufficient to use a straightforward
control flow through the program statements. Slightly larger to medium sized
programs require the use of procedures to group individual statements in a logical
manner, while a main program directs the flow of control among the various top-level
procedures. When dealing with large programs it becomes necessary to group the
individual procedures into modules. The process by which this should be done is not
yet universally understood. This paper expresses the view that the use of data types
as a modularization principle is the right approach.

We will begin by discussing what it means to define a new data type and will
describe some basic data structuring mechanisms. Numerous examples are given to
familiarize the reader with the notation, building up to the final example involving the
design of a directory system.

1)  Type  Definitions 

A type  definition  has  two  basic  attributes: 

1)  functional  specification:  the  behavior  of  a particular  class  of  objects  as 
seen  by  users  of  that  class. 

2) algorithmic specification: the means by which that behavior is
accomplished.

The distinction between functional and algorithmic specification is most
important. First, it allows programmers to construct implementations in parallel with
one another without any more information in common than their mutual interfaces.
Second, it ensures that a change in data structure or algorithm is localized as long as
the functional specifications remain unchanged. A type definition is really a module in
the sense of the information boundaries discussed by Parnas [4]. It provides a clear
design procedure for the modularization of a large program or system, where one has
been previously lacking.


Language defined types such as integer, real, and boolean are called
scalar types. The term scalar is intended to signify that values of such a type are
considered to be indivisible. The scalar type integer characterizes single values taken
from the range {-n,...,n}, in conjunction with a set of operations which may be applied
to integers, such as addition, subtraction, multiplication, etc. As far as a user of type
integer is concerned, he should not know about, and more importantly not rely on, the
way in which integers are represented or the particular algorithms which implement
integer arithmetic on his computer. Adhering to this principle makes program
portability possible, since different machines may use different algorithms. We
generalize these notions to abstract data types and say that a type definition consists
of:

1) a representation for objects of the type, known only to the definition
itself (algorithmic specification).

2) a set of operations which manipulate this representation, the names
and calling sequences of which are known to users of the type
(functional specification), but the algorithms of which are known only
to the definition itself (algorithmic specification).


The binding of operations to particular types is vital to the description of
behavior. It is not enough to present a picture of a control block as the description of
a "process" in an operating system. What is necessary is an explanation of exactly
what can be done to a process, and specifically not how it is done. In fact control
block formats should not be disseminated at all; rather only a description of valid
operations which can be used on that data structure should be provided to users. One
of the most important things gained from this is the "sanctity" of a data structure.
Every data structure may be viewed as having an invariant relation by which it is
characterized (e.g. the mathematical notions of "list" and "tree"). It is easier to
verify the invariance of this relation if the relevant changes to the structure are made
by a small number of "privileged" procedures. If you parcel out a data structure
address and the right to modify arbitrary fields in arbitrary ways, it is possible (likely,
in a brand new program) to find inconsistencies in the structure due to program bugs
anywhere in the entire system. If all the information that is distributed is operation
names, the scope of possible sources of error is reduced to the implementation
of those very operations.
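The point about "privileged" procedures can be illustrated with a minimal sketch in Python rather than in the notation of this paper. The type name SortedList and its operations are hypothetical, chosen only for illustration. The invariant is "the items are in ascending order", and only the two operations that touch the representation can disturb it.

```python
import bisect


class SortedList:
    """Hypothetical type whose invariant is: _items is in ascending order."""

    def __init__(self):
        # The representation is private by convention; users see only operations.
        self._items = []

    def insert(self, value):
        # Privileged operation: the only place where elements are added.
        bisect.insort(self._items, value)

    def remove_smallest(self):
        # Privileged operation: the only place where elements are removed.
        if not self._items:
            raise IndexError("remove_smallest from empty SortedList")
        return self._items.pop(0)

    def __len__(self):
        return len(self._items)


s = SortedList()
for v in (5, 1, 3):
    s.insert(v)
drained = [s.remove_smallest() for _ in range(len(s))]
```

If the invariant were ever found to be violated, the search for the bug would be confined to insert and remove_smallest, which is the reduction in scope argued for above.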

The next sections present a set of tools useful for the description of data
abstractions, and introduce some relatively intuitive syntax.

2)  Enumeration  Types 

Often in the course of writing a program in a standard language, a programmer
will find the need to express values of scalar types which are not provided by the
language. For example, one might find an inventory program which, in order to
identify the color of an object, uses integers to represent red=1, yellow=2, etc. When
designing this program, it is not relevant to define the mapping between colors and
integers, since this is merely an implementation detail. A mechanism is needed for
defining a new type by exhibiting the set of constants which comprise its values. A
type so defined is called an enumeration type, the concept having been taken from
PASCAL. An example of the definition of enumeration types might be:

    type color = (red, yellow, blue, green) end;

    type weekday = (monday, tuesday, wednesday, thursday, friday) end;

In order to declare variables as instances of particular types, we will use syntax
of the form:

    declare payday, dayoff:weekday, shade:color;

The declaration states that payday and dayoff are variables which may only
possess values appropriate to type weekday, and that shade can only possess a value
appropriate to type color. These variables behave as do more common scalar variables
with respect to their appropriate constants. For example,

    payday <- wednesday;

would be valid, but

    payday <- red;

would be meaningless. Operations are defined for enumeration types as for example,

    type color = (red, yellow, blue, green);

        op complement(c:color):color =
            case c of
            begin
                red:    green;
                yellow: blue;
                green:  red;
                blue:   yellow
            end

    end;


The operations provided by a type are declared inside the scope of the
definition in order to exhibit modularity. The brackets type and end delimit the
definition. To use one of these operations in a program description, it is necessary to
prefix the operation name with the type name in order to permit several types to have
operations with the same name (and for readability). The complement operation takes
a single parameter of type color and returns a value of the same type. For example, if
the statements

    shade <- green;
    shade <- color.complement(shade);

were executed, shade would obtain the value red. Even though in most languages an
implementation of these types will require the use of integers, the
irrelevance of the mapping may nevertheless be preserved by declaring macros (or
variables) for use in place of the integer constants.
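As a rough illustration of how this might look in an existing language, the following Python sketch uses the standard enum module. The integer values are an implementation detail that callers never need to mention; the class and method names simply mirror the color example and are not part of the paper's notation.

```python
from enum import Enum


class Color(Enum):
    # The integers are an implementation detail; users refer only to the names.
    RED = 1
    YELLOW = 2
    BLUE = 3
    GREEN = 4

    def complement(self):
        # Same mapping as the complement operation specified above.
        table = {
            Color.RED: Color.GREEN,
            Color.YELLOW: Color.BLUE,
            Color.GREEN: Color.RED,
            Color.BLUE: Color.YELLOW,
        }
        return table[self]


shade = Color.GREEN
shade = shade.complement()   # shade is now Color.RED
```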


3) Composite Types

Given the ability to define new scalar types, we must now be able to build upon
those definitions to define more interesting types.

Consider as an example the definition of the type complex. We wish to
represent numbers in the complex plane and provide arithmetic operations on them.
We need to specify to users that the operations

    add(w,z:complex):complex
    mul(w,z:complex):complex
    create(a,b:real):complex

will be available. Add will take two complex values and produce the complex sum. Mul
will behave analogously. Create will form a new complex value equal to a+bi. In terms
of algorithmic specification, we can represent a complex value by its real and imaginary
parts. Consider the following definition:

    type complex =
        declare r,i:real;

        op add(w,z:complex):complex =
            begin
                declare sum:complex;
                sum.r <- w.r+z.r;
                sum.i <- w.i+z.i;
                return sum
            end;

        op mul(w,z:complex):complex =
            begin
                declare prod:complex;
                prod.r <- w.r*z.r-w.i*z.i;
                prod.i <- w.r*z.i+w.i*z.r;
                return prod
            end;

        op create(a,b:real):complex =
            begin
                declare c:complex;
                c.r <- a; c.i <- b;
                return c
            end

    end;


The body of the definition contains some variable declarations (r and i) and some
operation definitions (add, mul, and create). When a variable of type complex is
declared, e.g.

    declare x:complex;

or dynamically allocated, e.g.

    complex.create(1,2)

the variables inside the definition of complex are allocated, just as when an integer
variable is declared the bit string by which it is represented is allocated. From outside
the definition it appears that x is an atomic entity to be used with the operations
provided, e.g.

    complex.add(x,y).

From inside the type definition, as within, say, the microcode which implements integer
arithmetic, the structure of x is visible. In this case the structure consists of the
variables r and i. To access a particular structure, it is necessary to specify which
structure by prefixing the variables with the name of a complex variable. For example,
the use of w.r in the body of the add operation refers to the r part of the complex
object w which was passed in as a parameter. The use of sum.i refers to the i part of
the locally declared complex object sum. An implementation of this type in a language
with dynamic storage allocation will be straightforward, i.e. it will be possible to return
a pointer to a newly allocated structure representing the result of an operation. In
other languages it may be necessary to write the operations as subroutines, passing
the destination variable as an output parameter. In FORTRAN, storage allocation could
be simulated by declaring an array of pairs, each index into the array representing a
single complex value. The informal language in which these examples are presented
assumes that functions can return values of arbitrary types.

An  example  of  the  use  of  type  complex  might  be: 

    declare x:complex;

    x <- complex.add( complex.create(1,-2), complex.create(2,3) );

which would result in x acquiring the value 3+i.

Note that we were not forced to represent complex values in terms of their real
and imaginary parts. We are free to re-design the type definition as long as we
conform to the stated specifications of the operations. For example, we might decide
to use polar coordinates to represent a complex value, if that made the operations
more efficient. Although the create operation is still specified to produce the value
a+bi from parameters a and b, there is nothing to prevent it from internally converting
to any other representation, and there is certainly nothing to prevent us from writing
a different create operation which expects arguments with a new significance, e.g.

    type complex =
        declare r,theta:real;

        op create(a,b:real):complex =
            begin
                declare c:complex;
                c.r <- sqrt(a*a+b*b);
                c.theta <- arcsin(b/c.r);
                return c
            end

    end;

In addition to the allocation of the structure variables, when a variable of type
complex (or any type) is declared, the create operation is implicitly invoked. This is
important in order to allow the data structure invariant relation (if there is one) to be
established. Although in this simple example this is unnecessary, it will become
necessary later on.
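The freedom to change representation without changing the functional specification can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's definition: the class names RectComplex and PolarComplex and the accessors real and imag are hypothetical (the paper specifies only add, mul, and create), and the polar version uses atan2 in place of the arcsin shown above so that the angle is recovered in every quadrant and for a zero modulus.

```python
import math


class RectComplex:
    """Representation by real and imaginary parts."""

    def __init__(self, a, b):
        self._r, self._i = a, b

    def real(self):
        return self._r

    def imag(self):
        return self._i

    def mul(self, other):
        return RectComplex(self._r * other._r - self._i * other._i,
                           self._r * other._i + self._i * other._r)


class PolarComplex:
    """Same operations, but represented by modulus and angle."""

    def __init__(self, a, b):
        # create still takes a and b meaning a+bi; it converts internally.
        self._r = math.hypot(a, b)
        self._theta = math.atan2(b, a)   # assumption: atan2 instead of arcsin

    def real(self):
        return self._r * math.cos(self._theta)

    def imag(self):
        return self._r * math.sin(self._theta)

    def mul(self, other):
        # Multiplication is cheap in polar form: multiply moduli, add angles.
        out = PolarComplex(0.0, 0.0)
        out._r = self._r * other._r
        out._theta = self._theta + other._theta
        return out


# The same user code runs against either representation.
results = []
for cls in (RectComplex, PolarComplex):
    p = cls(1, -2).mul(cls(2, 3))        # (1-2i)(2+3i) = 8-i
    results.append((p.real(), p.imag()))
```

The user code in the loop never mentions r, i, or theta, so swapping one class for the other leaves it unchanged, up to floating point rounding.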

4)  Data  Structuring 

A comprehensive presentation of data structuring methods in relation to types
may be found in [2]. We will discuss briefly some structuring mechanisms in order to
have some syntactic base upon which to build.


4.1) Arrays


In the past, an array has generally been thought of as a table indexed by
integers, as is the case with a matrix. A matrix, however, is a type, having certain
operations such as multiplication and inversion defined for it. An array is only a
structuring mechanism and might as well be generalized to a more powerful mapping
than is possible with just integer domains. Think of the array as a mapping from any
finite scalar type to any type at all. For example,

    declare A:array[color] of integer;

declares a vector which is indexed by objects of type color and whose individual
elements are integers. We could use such an array to represent the frequencies of
the various colors of light, e.g. A[red] = 6485.

A two dimensional array can be modeled as a vector of vectors, e.g.

    declare A:array[1:10] of array[1:5] of real;

so that A[i] denotes a row (or column, depending upon the interpretation).
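The view of an array as a mapping from a finite scalar type to an arbitrary type can be approximated in Python with a dictionary keyed by the members of an enumeration. The sketch below is illustrative only; the class name EnumArray and its constructor arguments are hypothetical and do not appear in the paper.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    YELLOW = 2
    BLUE = 3
    GREEN = 4


class EnumArray:
    """Fixed mapping from every member of a finite enumeration to a value."""

    def __init__(self, index_type, initial):
        self._index_type = index_type
        # One cell per constant of the index type, as in array[color] of integer.
        self._cells = {member: initial for member in index_type}

    def _check(self, key):
        if not isinstance(key, self._index_type):
            raise TypeError("index must be of type " + self._index_type.__name__)

    def __getitem__(self, key):
        self._check(key)
        return self._cells[key]

    def __setitem__(self, key, value):
        self._check(key)
        self._cells[key] = value


A = EnumArray(Color, 0)
A[Color.RED] = 6485
```

Indexing with anything other than a Color, for example A[1], is rejected, which corresponds to the restriction that the index must be a value of the declared scalar type.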


4.2) Records

Another means of grouping values is the record structure. The local data of any
type declaration can be thought of as a record. The variables of the local data
denote the fields of the record. These fields are accessed as previously noted, by
prefixing the field name by the name of the complete object. While arrays are used to
group values of the same type, records are used to group values of potentially
differing type. Often records are used to structure data in a hierarchical manner, as in
PL/I or COBOL. For this facility we will use the notation

    declare r:record (f1:t1, f2:t2, ..., fn:tn);

to declare a variable r which has n subfields (r.f1, r.f2, ..., r.fn) of types (t1, t2, ..., tn)
respectively. The ti can be arbitrary, including other records. For example,

    declare person:record (name:string,
                           age:integer,
                           salary:record (regular:real,
                                          overtime:real));

The subfields of this variable are person.name, person.age, person.salary.regular,
and person.salary.overtime. It is possible to declare a table as an array of records,
e.g.

    declare T:array[1:100] of record (height,weight:integer);

in which case the fields would be accessed as T[i].height and T[i].weight. In general,
however, if a record is to be used in any non-trivial way, it should be embedded in a
type declaration, e.g.

    declare T:array[1:100] of medicalrec;

where medicalrec defines the fields of the record as local data and provides operations
on that data. Since it is not unlikely that the structure of medicalrec will change at
some point during system development, in this way we anticipate modifications by
restricting their scope.

4.3)  References 

A reference value is a pointer to an object, with the restriction that reference
variables can only reference objects of a single type. Typeless pointer variables are
the cause of many program bugs since they are free to point to arbitrary addresses.
References are much safer to use and make programs less devious in their logic. The
reference concept is important with regard to the use of a dynamic storage allocation
mechanism. We postulate the operations new and free for every data type, but make
them available for use only inside of a type definition. This is because of the need for
initialization of the data structure invariant relation discussed in section 1. From
outside of a type definition, the create operation must be used.

If  c is  a reference  to  complex,  i.e. 

declare  c:ref  complex; 

then, inside of type complex, the statement

    c <- new complex;


causes  the  dynamic  allocation  of  a complex  object,  and  assigns  a reference  to  that  new 
object  to  c.  The  statement 


    free c;

results in the storage referenced by c being released. All that we can do with the
value of c is to assign it to other ref complex variables or pass it as a procedure
argument, since as yet we have not defined any operations for reference types. In
particular, we will not allow statements such as

    c <- c+2;

which are likely to introduce errors. More importantly, these statements violate the
notion of type, since if the address contained in c is incremented by 2 it probably no
longer is the address of a complex variable. We will define an operator for references
called dereferencing, which converts a reference into the object it references. If c
contains a valid reference, then c↑ is of type complex. For example, c↑.r is the r field
of the complex object referenced by c. It is not legal to write c.r, since c is not of
type complex. This imposes more structure on the program specifications and prevents
many possible bugs. Our examples assume that ↑ applies to the reference immediately
to its left. We will use the reference value null to indicate that no object is currently
being referenced.

5)  A Small  Abstraction 

With the tools just presented, we can begin to describe some simple abstract
objects. Suppose we would like to model the concept of a "bag" full of data, in the
sense that once a datum is inserted in the bag, it can only be removed by trial and
error, since "reaching your hand" into the bag is done blindly. Assume that the bag is
to hold integers.

We need to provide certain key operations, namely:

    empty(b:bag):boolean
    put(b:bag,i:integer)
    take(b:bag):integer

and we would like to guarantee to the external world that a "taken" element is chosen
at random. There are several ways to specify this type. One way is to use a vector
to represent the state of the bag, although we have to place an upper limit on the size
of the bag. Consider:


    type bag =
        declare A:array[1:100] of integer,
                top:integer;

        op create:bag =
            begin
                declare b:ref bag;
                b <- new bag;
                b↑.top <- 0;
                return b↑
            end;


So far we have specified that a bag will consist of a vector of length 100 (A),
and an integer (top) which we will use to indicate the current size. We have specified
the create operation, since here it is necessary to perform initialization. The invariant
relation of a bag is "the bag contains (top) elements" so we must set top to zero
whenever a bag is created. If we had allowed the new operation to be used outside of
the type definition, top could remain uninitialized.

We can specify the empty and put operations easily:

        op empty(b:bag):boolean = return (b.top=0);

        op put(b:bag,i:integer) =
            if b.top=100 then error comment overflow;
            else
                begin
                    b.top <- b.top+1;
                    b.A[b.top] <- i
                end;


Assuming  a uniform  random  number  generator,  take  can  be  specified  as: 


        op take(b:bag):integer =
            if b.top=0 then error comment underflow;
            else
                begin
                    declare k,i:integer;
                    k <- randint(1,b.top);
                    i <- b.A[k];
                    b.A[k] <- b.A[b.top] comment fill in gap;
                    b.top <- b.top-1;
                    return i
                end

    end of type bag;
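For readers who want to experiment, the vector-based bag can be transcribed into Python roughly as follows. This is a sketch under a few assumptions: indexing is zero-based rather than one-based, errors are reported by raising exceptions, and the optional rng argument is a hypothetical addition that makes the random source replaceable.

```python
import random


class Bag:
    CAPACITY = 100   # upper limit on the size of the bag, as in the specification

    def __init__(self, rng=None):
        self._a = [0] * self.CAPACITY
        self._top = 0                      # invariant: the bag holds _top elements
        self._rng = rng or random.Random()

    def empty(self):
        return self._top == 0

    def put(self, i):
        if self._top == self.CAPACITY:
            raise OverflowError("bag overflow")
        self._a[self._top] = i
        self._top += 1

    def take(self):
        if self._top == 0:
            raise IndexError("bag underflow")
        k = self._rng.randrange(self._top)   # random occupied slot
        i = self._a[k]
        self._a[k] = self._a[self._top - 1]  # fill in the gap with the last element
        self._top -= 1
        return i


b = Bag()
for v in range(1, 6):
    b.put(v)
taken = [b.take() for _ in range(5)]
```

The order of the values in taken varies from run to run, but it always contains exactly the five values that were put in.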


If we desire to remove the restriction on the size of a bag, we can store the
bag contents in a list. To define a list of integers, we need to define each node in the
list and the rules for connecting them. We will first specify type node, so that we can
use its general purpose operations to manage a list:


    type node =
        declare val:integer,
                next:ref node;

        op create(i:integer):node =
            begin
                declare n:ref node;
                n <- new node;
                n↑.val <- i;
                n↑.next <- null;
                return n↑
            end;

        op destroy(n:node) = free n;

The operation called follow causes node1 to point at node2:

        op follow(node1,node2:node) = node1.next <- node2;

Operation successor returns the current next value of node1:

        op successor(node1:node):node = return node1.next↑;

and operation value returns the current integer value of node1:

        op value(node1:node):integer = return node1.val

    end of type node;

A useful type of list for the purpose of defining type bag is a circular list, so
that a random choice from a bag is accomplished by "spinning" the list before deletion.
A circular list of integers defined using type node could be:


    type clist =
        declare current:ref node,
                size:integer;

        op create:clist =
            begin
                declare c:ref clist;
                c <- new clist;
                c↑.current <- null;
                c↑.size <- 0;
                return c↑
            end;


The insert operation maintains the proper circularity:

        op insert(i:integer,c:clist) =
            begin
                declare t,n:ref node;
                c.size <- c.size+1;
                n <- node.create(i);
                if c.size=1 then comment first element, which follows itself;
                    begin
                        c.current <- n;
                        node.follow(c.current↑,c.current↑)
                    end
                else
                    begin
                        t <- node.successor(c.current↑);
                        node.follow(c.current↑,n↑);
                        node.follow(n↑,t↑)
                    end
            end;


The circular list spin operation can be defined as:

        op spin(c:clist) =
            begin
                declare n:integer;
                n <- randint(0,c.size);
                for i from 1 to n do
                    c.current <- node.successor(c.current↑)
            end;


and the remove operation (which deletes the element after current) as:

        op remove(c:clist):integer =
            begin
                declare t:ref node,
                        k:integer;
                if c.size=0 then error comment underflow;
                else if c.size=1 then
                    begin
                        k <- node.value(c.current↑);
                        node.destroy(c.current↑);
                        c.current <- null;
                        c.size <- c.size-1;
                        return k
                    end
                else
                    begin
                        t <- node.successor(c.current↑);
                        node.follow(c.current↑,node.successor(t↑));
                        k <- node.value(t↑);
                        node.destroy(t↑);
                        c.size <- c.size-1;
                        return k
                    end
            end;


One more useful operation will tell us if the list is empty:

        op empty(c:clist):boolean = return (c.size=0)

    end of type clist;


Now type bag can be simply defined as:

    type bag =
        declare c:clist;

        comment the create operation for a clist performs the
                necessary initialization for a bag;

        op empty(b:bag):boolean = return clist.empty(b.c);

        op put(b:bag,i:integer) = clist.insert(i,b.c);

        op take(b:bag):integer =
            begin
                clist.spin(b.c);
                return clist.remove(b.c)
            end

    end of type bag;


Note that we have completely re-designed the algorithmic specification of type
bag, but left the functional specifications unchanged. Any user of type bag would not
have had his programs affected (except for the size restriction having been removed).
Note also that the intermediate types node and clist can be re-used in the specification
of other list structures or more abstract types such as bag without the need for
duplication of code or effort. In addition, the three types make up three modules
suitable for independent programmer work assignments.
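The list-based redesign can be sketched in Python in the same spirit. This is an illustration, not a definitive implementation: the class names _Node, CList, and Bag follow the types above, garbage collection stands in for the destroy operation, and the size counter is decremented on removal so that empty reports correctly after elements are taken.

```python
import random


class _Node:
    def __init__(self, val):
        self.val = val
        self.next = None


class CList:
    """Circular list; remove deletes the element after current."""

    def __init__(self, rng=None):
        self._current = None
        self._size = 0
        self._rng = rng or random.Random()

    def empty(self):
        return self._size == 0

    def insert(self, i):
        n = _Node(i)
        if self._size == 0:
            n.next = n                      # a single node follows itself
            self._current = n
        else:
            n.next = self._current.next     # splice n in after current
            self._current.next = n
        self._size += 1

    def spin(self):
        # Advance current a random number of steps around the circle.
        for _ in range(self._rng.randint(0, self._size)):
            self._current = self._current.next

    def remove(self):
        if self._size == 0:
            raise IndexError("clist underflow")
        t = self._current.next
        if self._size == 1:
            self._current = None            # t was the only node
        else:
            self._current.next = t.next     # unlink t
        self._size -= 1
        return t.val


class Bag:
    """Same operations as the vector-based bag, without the size limit."""

    def __init__(self):
        self._c = CList()

    def empty(self):
        return self._c.empty()

    def put(self, i):
        self._c.insert(i)

    def take(self):
        self._c.spin()
        return self._c.remove()


b = Bag()
for v in range(200):          # more than the old limit of 100
    b.put(v)
taken = [b.take() for _ in range(200)]
```

Code written against empty, put, and take does not change when the vector-based bag is replaced by this one.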


6)  Parameterization 


As we mentioned in the introduction, abstraction involves the noticing of
similarities. This requires that things which may be potentially dissimilar among a
group of objects have a wide range of freedom of variation. The use of parameters to
procedures is necessary to convey the notion of abstract operation. The "square"
operation gains its usefulness because it will square any number. When types are
given parameters, they acquire much wider applicability in a similar way.

A parameterized type is called a type constructor, since the types resulting from
different parameter values may be widely differing. Consider the definition of type
string in terms of the primitive type character:


    type string(n:integer) =
        declare A:array[1:n] of character,
                len:integer initially n;

        op concat(s,t:string):string =
            begin
                declare r:string(s.len+t.len);
                for i from 1 to s.len do r.A[i] <- s.A[i];
                for i from 1 to t.len do r.A[s.len+i] <- t.A[i];
                return r
            end

        etc.

    end;


A type constructor may be thought of as a macro. When its actual parameters
are specified, it completely defines a new type. In the declaration of r in string.concat,
the parameter to its type is given as (s.len+t.len). A vector of this length (A) is
allocated, along with a word (len) containing this value. If not for the ability to
parameterize a type definition, we would have to define a new type string for every
possible string length!

Even more important is the ability to use type names as parameters to type
definitions. In our previous example, we made the unfortunate (and irrelevant)
commitment that the objects contained in a bag must be integers. If, however, we had
written the definition of bag as:

    type bag =
        declare c:clist(integer);

        etc.

    end;

we would be using a much more general notion of list. In fact, there is no reason to
restrict a bag to only integers:

    type bag(t:type) =
        declare c:clist(t);

        etc.

    end;

with t replacing all occurrences of integer. Type clist would be defined in the same
way as before, except that the type of the variable called current would be
determined from a parameter, as would the type of the variable called val in type
node.

It  would  not  be  hard  to  implement  the  notions  of  bag  and  clist  in  a language  with 
dynamic  allocation  and  pointer  facilities.  In  other  languages  it  might  be  necessary  to 
re-implement  them  for  every  different  parameter  value  (e.g.  using  arrays),  but  at  least 
the  specifications  do  not  have  to  be  re-done. 

In general, parameterization gives us the ability to precisely describe such
concepts as stack, queue, tree, etc., without tying down the types of values which are
not important to the structure of the data.
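In a language with generic types, a type parameter such as t can be written directly. The Python sketch below uses typing.Generic; it is illustrative only, stores the contents in a plain list rather than a clist to keep the example short, and relies on an external type checker (not the Python runtime) to enforce the element type.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Bag(Generic[T]):
    """A bag whose element type is a parameter, as in bag(t:type)."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def empty(self) -> bool:
        return not self._items

    def put(self, item: T) -> None:
        self._items.append(item)

    def take(self) -> T:
        if not self._items:
            raise IndexError("bag underflow")
        # Swap a randomly chosen element to the end, then remove it.
        k = random.randrange(len(self._items))
        self._items[k], self._items[-1] = self._items[-1], self._items[k]
        return self._items.pop()


names: Bag[str] = Bag()
names.put("ada")
names.put("grace")
drawn = [names.take(), names.take()]

counts: Bag[int] = Bag()
counts.put(42)
```

The single definition serves both Bag[str] and Bag[int]; nothing in the body of the class depends on the element type.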
7) A Larger Design Example

A data type having many applications in systems programming is the hash table.
For this discussion, we characterize a hash table by the type of object used to index
the table, by the type of object stored in the table, and by the number of buckets
provided:

    type hashtable(buckets:integer, index, result:type) =

Each bucket contains a list of elements with the same hash value. Each element is an
(index, result) value pair. Assuming a type pair which we can use to group two values
of arbitrary type:


    type pair(left,right:type) =
        declare lval:left,
                rval:right;

        op leftval(p:pair):left = return p.lval;

        op rightval(p:pair):right = return p.rval;

        op create(l:left,r:right):pair(left,right) =
            begin
                declare p:ref pair;
                p <- new pair;
                p↑.lval <- l;
                p↑.rval <- r;
                return p↑
            end

    end;


and assuming a definition for type list similar to LISP:

    type list(t:type) =
        declare p:ref pair(t,list(t));

        op head(l:list(t)):t = return pair.leftval(l.p↑);
            comment LISP car;

        op tail(l:list(t)):list(t) = return pair.rightval(l.p↑);
            comment LISP cdr;

        op cons(tv:t,lv:list(t)):list(t) =
        begin
            declare q:ref list(t);
            q←list.create;
            q.p←pair.create(tv,lv);
            return q↑
        end;

        op create:list(t) =
        begin
            declare l:ref list(t);
            l←new list;
            l↑.p←null;
            return l↑
        end;

        op empty(l:list):boolean = return (l.p=null)
    end;
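
For readers who prefer an executable rendering, the pair and list types above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class names Pair and ConsList, the use of None for null, and the choice of immutable dataclasses are ours and are not part of the specification language used in this paper.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

L = TypeVar("L")
R = TypeVar("R")
T = TypeVar("T")


@dataclass(frozen=True)
class Pair(Generic[L, R]):
    """Groups two values of arbitrary type (the paper's type pair)."""
    lval: L
    rval: R

    def leftval(self) -> L:
        return self.lval

    def rightval(self) -> R:
        return self.rval


@dataclass(frozen=True)
class ConsList(Generic[T]):
    """LISP-style list: empty when p is None, otherwise a (head, tail) pair."""
    p: Optional["Pair[T, ConsList[T]]"] = None

    @staticmethod
    def create() -> "ConsList[T]":
        # An empty list has no underlying pair.
        return ConsList(None)

    def empty(self) -> bool:
        return self.p is None

    def head(self) -> T:  # LISP car
        assert self.p is not None, "head of an empty list"
        return self.p.leftval()

    def tail(self) -> "ConsList[T]":  # LISP cdr
        assert self.p is not None, "tail of an empty list"
        return self.p.rightval()

    @staticmethod
    def cons(tv: T, lv: "ConsList[T]") -> "ConsList[T]":
        # Build a new list whose head is tv and whose tail is lv.
        return ConsList(Pair(tv, lv))
```

A list holding 1 and 2 would then be built as ConsList.cons(1, ConsList.cons(2, ConsList.create())). The assertions in head and tail are an addition of this sketch; the specification above leaves the behaviour on an empty list unstated.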


we can specify the structure of a hashtable as:


    declare A:array[1:buckets] of list(pair(index,result));


In order to find a result value given an index value, the appropriate array
element must be found using a hash function. Then the list must be searched for the
index value. Since the hash table module knows nothing about the representation of
values of type index, it is up to index itself to provide that hash function. For example,
in the case of indexing by strings, we would expect to find the operation string.hash
defined. We can then define the hashtable.find operation as:


    op find(h:hashtable,i:index):result =
        return search(i,h.A[1+(index.hash(i) mod upperbound(h.A))]);


where search is a procedure (local to type hashtable only) to scan the list:

    procedure search(i:index,l:list(pair(index,result))):result =
    begin
        declare p:list(pair(index,result));
        p←l;
        while not list.empty(p) do
            if pair.leftval(list.head(p))=i then
                return pair.rightval(list.head(p))
            else p←list.tail(p);
        return result.undefined
    end;


Note  that  if  the  search  procedure  does  not  find  the  index  on  the  list,  it  returns 
the  undefined  value  of  type  result  - a convenient  way  to  indicate  an  error  to  the 
caller.  This  of  course  assumes  an  operation  called  "undefined"  for  type  result. 
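
The find and search operations, together with the undefined convention, might be sketched in Python as below. Several things here are assumptions of the sketch rather than part of the design: the sentinel object UNDEFINED stands in for result.undefined, Python's built-in hash stands in for index.hash, and each bucket is an ordinary Python list of (index, result) tuples. Because Python sequences are indexed from 0, the "1+" offset of the original is not needed.

```python
from typing import Any, Hashable, List, Tuple

# Assumed sentinel playing the role of result.undefined.
UNDEFINED = object()

Bucket = List[Tuple[Hashable, Any]]


def search(i: Hashable, bucket: Bucket) -> Any:
    """Scan one bucket for index i; return its result or UNDEFINED."""
    for index, result in bucket:
        if index == i:
            return result
    return UNDEFINED


def find(buckets: List[Bucket], i: Hashable) -> Any:
    """Select the bucket by hash value, then search it."""
    return search(i, buckets[hash(i) % len(buckets)])
```

The caller is expected to test the returned value against UNDEFINED (with "is") before using it, which is the price of signalling an error through the result value.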

It now remains to be able to insert values into the hash table. The insert
function must search the appropriate list and only add to the list if the value is not
already there:


    op insert(h:hashtable,i:index,r:result):boolean =
    begin
        declare n:integer;
        n←1+(index.hash(i) mod upperbound(h.A));
        if search(i,h.A[n])=result.undefined then
        begin
            h.A[n]←list.cons(pair.create(i,r),h.A[n]);
            return true
        end
        else return false
    end;


Finally, the creation of a new hashtable requires some initialization:


    op create:hashtable =
    begin
        declare h:ref hashtable(buckets,index,result);
        h←new hashtable;
        for i from 1 to upperbound(h↑.A) do
            h↑.A[i]←null;
        return h↑
    end

end of type hashtable;
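
Taken as a whole, the hashtable type could be rendered in Python along the following lines. This is a sketch under assumptions, not a definitive implementation: the class name HashTable, the hash_fn parameter (defaulting to Python's built-in hash in place of index.hash), the UNDEFINED sentinel, and the use of Python lists for buckets are all ours.

```python
from typing import Any, Callable, List, Tuple

# Assumed sentinel playing the role of result.undefined.
UNDEFINED = object()


class HashTable:
    """A fixed number of buckets, each a list of (index, result) pairs."""

    def __init__(self, buckets: int,
                 hash_fn: Callable[[Any], int] = hash) -> None:
        if buckets < 1:
            raise ValueError("a hash table needs at least one bucket")
        # Corresponds to create: every bucket starts out empty.
        self._a: List[List[Tuple[Any, Any]]] = [[] for _ in range(buckets)]
        self._hash = hash_fn

    def _bucket(self, i: Any) -> List[Tuple[Any, Any]]:
        # 0-based indexing, so no "1 +" offset is required.
        return self._a[self._hash(i) % len(self._a)]

    def find(self, i: Any) -> Any:
        """Return the result stored under i, or UNDEFINED."""
        for index, result in self._bucket(i):
            if index == i:
                return result
        return UNDEFINED

    def insert(self, i: Any, r: Any) -> bool:
        """Add (i, r) only if i is absent; report whether it was added."""
        if self.find(i) is not UNDEFINED:
            return False
        # Mirrors list.cons: the new pair goes on the front of the bucket.
        self._bucket(i).insert(0, (i, r))
        return True
```

As in the specification, a second insert under an existing index leaves the table unchanged and returns false, so the first value stored under an index is the one that find reports.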


8) A Directory System


To  illustrate  the  power  of  systems  design  using  data  types,  we  have  chosen  as 
our  final  example  a hypothetical  directory  system  as  might  be  found  in  an  operating 
system. 

The directory system to be designed will be a hierarchical one, in which a node
name consists of a path through the directory structure to the appropriate descriptor.
For example, the name A.B.C refers to directory A, subdirectory B, descriptor C. There
should be no a priori bound on the number of directories, nor on the depth of the
hierarchy. We will not restrict the definition to that of a file directory, since the notion
of file is not relevant to the structure. The purpose of the directory is to associate an
object of type descriptor (maybe filedescriptor) with one of type path. The operations
provided for directories should include lookup and insertion functions. If we think of a
path name as a list of strings, we can define type path as:

    type path =
        declare l:list(string);

        op first(p:path):string = return list.head(p.l);
        op rest(p:path):path = return list.tail(p.l);
        op empty(p:path):boolean = return list.empty(p.l);
        op terminal(p:path):boolean = return path.empty(path.rest(p))
    end;
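
A Python sketch of type path follows. It assumes, beyond what is stated above, that a path is held as a tuple of strings and that a dotted name such as A.B.C is split on "." by a helper called parse; that helper is hypothetical and does not appear in the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """A path name viewed as a sequence of strings."""
    parts: Tuple[str, ...]

    @staticmethod
    def parse(name: str) -> "Path":
        # Hypothetical helper: split a dotted name into its components.
        return Path(tuple(name.split("."))) if name else Path(())

    def first(self) -> str:
        return self.parts[0]

    def rest(self) -> "Path":
        return Path(self.parts[1:])

    def empty(self) -> bool:
        return len(self.parts) == 0

    def terminal(self) -> bool:
        # Defined, as in the specification, as empty(rest(p)).
        return self.rest().empty()
```

One consequence of following the definition literally: because slicing an empty tuple yields an empty tuple, terminal also holds for the empty path in this sketch, a case the specification leaves open.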

The directory itself will contain objects which will either be descriptors or
sub-directories. We will call this type direntry for directory entry. The structure of a
directory, then, will be a hashtable (as previously described) which is indexed by a
string and which contains direntry objects:

    type directory(descriptor:type) =
        declare H:hashtable(100,string,direntry(descriptor));

The hashtable will have 100 buckets, but we could have left that as a parameter
to directory.

In order to look up a path in a directory, we must first find the direntry
corresponding to the path header (via hashtable.find) and then look up the rest of the
path through the direntry:


    op lookup(p:path,d:directory):descriptor =
    begin
        declare t:ref direntry;
        t←hashtable.find(d.H,path.first(p));
        if t↑=direntry.undefined then
            return descriptor.undefined
        else return direntry.lookup(path.rest(p),t↑)
    end;


Insertion in a directory may require the creation of a direntry and its insertion in
the hashtable, or simply insertion in an existing direntry:


    op insert(p:path,d:directory,r:descriptor):boolean =
    begin
        declare t:ref direntry;
        t←hashtable.find(d.H,path.first(p));
        if t↑=direntry.undefined then
        begin
            t←direntry.create(p,r);
            return hashtable.insert(d.H,path.first(p),t↑)
        end
        else return direntry.insert(path.rest(p),t↑,r)
    end;


Finally,  the  creation  of  a directory  is  accomplished  by: 


    op create:directory(descriptor) =
        (new directory(descriptor))↑

end of type directory;


Having  finished  the  definition  of  type  directory,  we  must  now  tackle  the 
definition  of  type  direntry.  A direntry  has  a name  (part  of  a path)  and  either  indicates 
a descriptor  or  a sub-directory: 


    type direntry(descriptor:type) =
        declare name:string,
                node:boolean,
                value:ref descriptor,
                subdir:ref directory(descriptor);


To look up a path in a direntry which is a node requires checking that the path
terminates at that point. If the direntry refers to a sub-directory, then the
directory.lookup operation is recursively applied:


    op lookup(p:path,d:direntry):descriptor =
        if d.node then
            if path.empty(p) then return d.value↑
            else return descriptor.undefined
        else return directory.lookup(p,d.subdir↑);


To  insert  a descriptor  in  an  existing  direntry,  we  must  do  the  insertion  in  the 
sub-directory  (if  there  is  one): 


    op insert(p:path,d:direntry,r:descriptor):boolean =
        if d.node or path.empty(p) then return false
        else return directory.insert(p,d.subdir↑,r);


Creation of a new direntry (as required by the directory.insert operation) is
straightforward when the path is a terminal. When it is not, creation of the direntry
requires creation of a sub-directory:


    op create(p:path,r:descriptor):direntry(descriptor) =
    begin
        declare d:ref direntry(descriptor);
        d←new direntry;
        if path.terminal(p) then
        begin
            d↑.name←path.first(p);
            d↑.node←true;
            d↑.subdir←null;
            d↑.value←r
        end
        else
        begin
            d↑.name←path.first(p);
            d↑.node←false;
            d↑.value←null;
            d↑.subdir←directory(descriptor).create;
            directory.insert(path.rest(p),d↑.subdir↑,r)
        end;
        return d↑
    end;
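
The mutual recursion between directory and direntry may be easier to follow in a compact executable form. The Python sketch below makes several simplifying assumptions that are ours alone: a Python dict takes the place of the 100-bucket hashtable, a path is a tuple of strings, None stands for both descriptor.undefined and direntry.undefined (so descriptors themselves must not be None), and operations on an empty path at directory level are rejected, a case the specification does not address.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Generic, Optional, Tuple, TypeVar

D = TypeVar("D")          # the descriptor type parameter
PathT = Tuple[str, ...]   # assumed path representation


@dataclass
class DirEntry(Generic[D]):
    name: str
    node: bool                       # True: holds a descriptor
    value: Optional[D] = None
    subdir: Optional[Directory[D]] = None

    @staticmethod
    def create(p: PathT, r: D) -> DirEntry[D]:
        if len(p) == 1:              # terminal path: a descriptor node
            return DirEntry(p[0], True, value=r)
        sub: Directory[D] = Directory()
        sub.insert(p[1:], r)         # the rest of the path goes below
        return DirEntry(p[0], False, subdir=sub)

    def lookup(self, p: PathT) -> Optional[D]:
        if self.node:
            # A node matches only if the path ends exactly here.
            return self.value if not p else None
        return self.subdir.lookup(p)

    def insert(self, p: PathT, r: D) -> bool:
        if self.node or not p:
            return False
        return self.subdir.insert(p, r)


@dataclass
class Directory(Generic[D]):
    h: Dict[str, DirEntry[D]] = field(default_factory=dict)

    def lookup(self, p: PathT) -> Optional[D]:
        if not p:                    # assumption: empty path is undefined
            return None
        t = self.h.get(p[0])
        return None if t is None else t.lookup(p[1:])

    def insert(self, p: PathT, r: D) -> bool:
        if not p:                    # assumption: empty path is rejected
            return False
        t = self.h.get(p[0])
        if t is None:
            self.h[p[0]] = DirEntry.create(p, r)
            return True
        return t.insert(p[1:], r)
```

With root = Directory(), the call root.insert(("A", "B", "C"), d) creates directory A, sub-directory B and descriptor C in one step, and a later root.lookup(("A", "B", "C")) follows the same chain back down. Inserting under an existing descriptor, or at a path that is already occupied, returns False.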


In the end we have a precise description of the concept of directory. This
directory system may be used in conjunction with a file system (note that there is a
difference between a file system and the directory structure it uses, as we have just
shown). It may also be used to store objects in a protection system, or for any of a
number of other applications. Each of the types directory, direntry, hashtable, etc. may
be viewed as a module suitable for a programmer work assignment.

Summary

There is much confusion in the programming community today over such
concepts as modularity, structured programming, specifications, etc. What we have
presented here is a technique for employing all of these tools in unified form, i.e. data
types. We have, of course, omitted discussion of many topics associated with data
types, but have done so on the grounds that we are concerned with design and
specification, and not coding. The freedom to use a specification language without
constraints of particular syntax is very important.

The use of data types as a modularization principle views a module as being
responsible for the maintenance of some invariance. The explicit advantage of this is
that system verification may be done selectively by module. An implication of this is
that most changes to the system will be restricted to a small number of modules, if not
a single one.